Use AutoAI and Lale to predict credit risk with ibm-watsonx-ai¶
This notebook contains the steps and code to demonstrate support for AutoAI experiments in the watsonx.ai service. It introduces commands for data retrieval, training experiments, persisting pipelines, testing pipelines, refining pipelines, and scoring.
Some familiarity with Python is helpful. This notebook uses Python 3.12.
Learning goals¶
The learning goals of this notebook are:
- Work with watsonx.ai experiments to train AutoAI models.
- Compare the quality of trained models and select the best one for further refinement.
- Refine the best model and test new variations.
- Perform online deployment and score the trained model.
Contents¶
This notebook contains the following parts:
1. Set up the environment¶
Before you use the sample code in this notebook, contact your Cloud Pak for Data administrator and ask for your account credentials.
%pip install -U wget | tail -n 1
%pip install -U nbformat | tail -n 1
%pip install -U autoai-libs | tail -n 1
%pip install -U lale | tail -n 1
%pip install "scikit-learn==1.6.1" | tail -n 1
%pip install -U ibm-watsonx-ai | tail -n 1
Successfully installed wget-3.2
Successfully installed fastjsonschema-2.21.1 nbformat-5.10.4
Successfully installed autoai-libs-3.0.3
Requirement already satisfied: sortedcontainers~=2.2 in /opt/user-env/pyt6/lib64/python3.12/site-packages (from portion->jsonsubschema>=0.0.6->lale) (2.4.0)
Requirement already satisfied: threadpoolctl>=3.1.0 in /opt/user-env/pyt6/lib64/python3.12/site-packages (from scikit-learn==1.6.1) (3.6.0)
Successfully installed ibm-watsonx-ai-1.3.20
Define credentials¶
Authenticate with the watsonx.ai Runtime service on IBM Cloud Pak for Data. You need to provide the admin's username and the platform URL.
username = "PASTE YOUR USERNAME HERE"
url = "PASTE THE PLATFORM URL HERE"
Use the admin's api_key to authenticate watsonx.ai Runtime services:
import getpass
from ibm_watsonx_ai import Credentials
credentials = Credentials(
    username=username,
    api_key=getpass.getpass("Enter your watsonx.ai API key and hit enter: "),
    url=url,
    instance_id="openshift",
    version="5.2",
)
Alternatively, you can use the admin's password:
import getpass
from ibm_watsonx_ai import Credentials
if "credentials" not in locals() or not credentials.api_key:
    credentials = Credentials(
        username=username,
        password=getpass.getpass("Enter your watsonx.ai password and hit enter: "),
        url=url,
        instance_id="openshift",
        version="5.2",
    )
Enter your watsonx.ai password and hit enter: ········
Create APIClient instance¶
from ibm_watsonx_ai import APIClient
client = APIClient(credentials)
Working with spaces¶
First of all, you need to create a space for your work. If you do not have a space already, you can use {PLATFORM_URL}/ml-runtime/spaces?context=icp4data to create one:
- Click New Deployment Space
- Create an empty space
- Go to the space Settings tab
- Copy the space_id and paste it below
Tip: You can also use the SDK to prepare the space for your work. More information can be found here.
Action: Assign space ID below
space_id = "PASTE YOUR SPACE ID HERE"
You can use the list method to print all existing spaces.
client.spaces.list(limit=10)
To interact with all resources available in watsonx.ai, you need to set the space that you will be using as the default.
client.set.default_space(space_id)
'SUCCESS'
2. Optimizer definition¶
Connection configuration¶
Credentials for the database should be passed as a Python dictionary.
For most of the supported databases, the credentials should follow the pattern below.
Warning: The database name should be selected from the list of supported databases; to look it up, use client.connections.list_datasource_types().
Warning: The input table should be uploaded to the database under the location /schema_name/table_name.
table_name = "CREDIT_RISK"
db_name = "PASTE YOUR DATA SOURCE DATABASE NAME HERE" # for example: "db2"
schema_name = "PASTE YOUR SCHEMA NAME HERE"
db_credentials = {
    "host": "PASTE YOUR DATABASE HOST HERE",
    "port": "PASTE YOUR DATABASE PORT HERE",
    "database": "PASTE YOUR DATABASE NAME HERE",
    "username": "PASTE YOUR DATABASE USER NAME HERE",
    "password": "PASTE YOUR DATABASE USER PASSWORD HERE",
    "ssl": True,  # set to False if SSL is disabled
}

if db_credentials["ssl"]:
    db_credentials["ssl_certificate"] = "PASTE YOUR DATABASE SSL CERTIFICATE HERE"
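Before creating the connection, it can help to verify that no placeholder values remain in the dictionary. This is a small convenience sketch, not part of the SDK; the `check_filled` name and the reliance on the `PASTE`-prefixed placeholders above are assumptions for illustration.

```python
def check_filled(creds):
    """Raise if any credential value still holds a placeholder string."""
    missing = [
        key
        for key, value in creds.items()
        if isinstance(value, str) and value.startswith("PASTE")
    ]
    if missing:
        raise ValueError(f"Fill in these credential fields: {missing}")

# check_filled(db_credentials)
```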
Create connection¶
data_source_type_id = client.connections.get_datasource_type_id_by_name(db_name)
conn_meta_props = {
    client.connections.ConfigurationMetaNames.NAME: f"Connection to Database - {db_name} ",
    client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: data_source_type_id,
    client.connections.ConfigurationMetaNames.DESCRIPTION: "Connection to external Database",
    client.connections.ConfigurationMetaNames.PROPERTIES: db_credentials,
}
conn_details = client.connections.create(meta_props=conn_meta_props)
Creating connections... SUCCESS
connection_id = client.connections.get_id(conn_details)
Download training data¶
import os
import wget
filename = "german_credit_data_biased_training.csv"
if not os.path.isfile(filename):
    filename = wget.download(
        "https://raw.githubusercontent.com/IBM/watsonx-ai-samples/master/cpd5.2/data/credit_risk/german_credit_data_biased_training.csv",
    )
Create connection asset¶
import pandas as pd
from ibm_watsonx_ai.helpers import DataConnection, DatabaseLocation
credit_risk_conn = DataConnection(
    connection_asset_id=connection_id,
    location=DatabaseLocation(schema_name=schema_name, table_name=table_name),
)
credit_risk_conn._api_client = client
credit_risk_conn.write(pd.read_csv(filename))
training_data_reference = [credit_risk_conn]
Optimizer configuration¶
Provide the input information for the AutoAI optimizer:
- name - experiment name
- prediction_type - type of the problem
- prediction_column - target column name
- scoring - optimization metric
from ibm_watsonx_ai.experiment import AutoAI
experiment = AutoAI(credentials, space_id=space_id)
pipeline_optimizer = experiment.optimizer(
    name="Credit Risk Prediction - AutoAI",
    desc="Sample notebook",
    prediction_type=AutoAI.PredictionType.BINARY,
    prediction_column="Risk",
    scoring=AutoAI.Metrics.ROC_AUC_SCORE,
)
Configuration parameters can be retrieved via get_params().
pipeline_optimizer.get_params()
{'name': 'Credit Risk Prediction - AutoAI',
'desc': 'Sample notebook',
'prediction_type': 'binary',
'prediction_column': 'Risk',
'prediction_columns': None,
'timestamp_column_name': None,
'scoring': 'roc_auc',
'holdout_size': None,
'max_num_daub_ensembles': None,
't_shirt_size': 'm',
'train_sample_rows_test_size': None,
'include_only_estimators': None,
'include_batched_ensemble_estimators': None,
'backtest_num': None,
'lookback_window': None,
'forecast_window': None,
'backtest_gap_length': None,
'cognito_transform_names': None,
'csv_separator': ',',
'excel_sheet': None,
'encoding': 'utf-8',
'positive_label': None,
'drop_duplicates': True,
'outliers_columns': None,
'text_processing': None,
'word2vec_feature_number': None,
'daub_give_priority_to_runtime': None,
'text_columns_names': None,
'sampling_type': None,
'sample_size_limit': None,
'sample_rows_limit': None,
'sample_percentage_limit': None,
'number_of_batch_rows': None,
'n_parallel_data_connections': None,
'test_data_csv_separator': ',',
'test_data_excel_sheet': None,
'test_data_encoding': 'utf-8',
'categorical_imputation_strategy': None,
'numerical_imputation_strategy': None,
'numerical_imputation_value': None,
'imputation_threshold': None,
'retrain_on_holdout': True,
'feature_columns': None,
'pipeline_types': None,
'supporting_features_at_forecast': None,
'numerical_columns': None,
'categorical_columns': None,
'confidence_level': None,
'incremental_learning': None,
'early_stop_enabled': None,
'early_stop_window_size': None,
'time_ordered_data': None,
'feature_selector_mode': None,
'run_id': None}
3. Experiment run¶
Call the fit() method to trigger the AutoAI experiment. You can run it either in interactive mode (synchronous job) or in background mode (asynchronous job) by specifying background_mode=True.
run_details = pipeline_optimizer.fit(
    training_data_reference=training_data_reference, background_mode=False
)
Training job 244c385d-b6f9-4bf4-b052-8b74a14f02a0 completed: 100%|████████| [03:13<00:00, 1.94s/it]
You can use the get_run_status() method to monitor AutoAI jobs in background mode.
pipeline_optimizer.get_run_status()
'completed'
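In background mode you can poll get_run_status() until the job reaches a terminal state. A minimal polling sketch — the `wait_for_run` name, the poll interval, the timeout, and the exact set of terminal status strings are illustrative assumptions, not part of the SDK:

```python
import time

def wait_for_run(get_status, poll_seconds=10.0, timeout_seconds=3600.0):
    """Poll a status callable until the run reaches a terminal state."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("completed", "failed", "canceled"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("AutoAI run did not finish within the timeout")

# Usage with the optimizer above:
# final_status = wait_for_run(pipeline_optimizer.get_run_status)
```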
4. Pipelines comparison and testing¶
You can list trained pipelines and evaluation metrics information in
the form of a Pandas DataFrame by calling the summary() method. You can
use the DataFrame to compare all discovered pipelines and select the one
you like for further testing.
summary = pipeline_optimizer.summary()
summary
| Enhancements | Estimator | training_roc_auc_(optimized) | holdout_average_precision | holdout_log_loss | training_accuracy | holdout_roc_auc | training_balanced_accuracy | training_f1 | holdout_precision | training_average_precision | training_log_loss | holdout_recall | training_precision | holdout_accuracy | holdout_balanced_accuracy | training_recall | holdout_f1 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pipeline Name | ||||||||||||||||||
| Pipeline_10 | HPO, FE, HPO, Ensemble | BatchedTreeEnsembleClassifier(SnapBoostingMach... | 0.852674 | 0.461366 | 0.360899 | 0.760652 | 0.864133 | 0.749397 | 0.812920 | 0.912052 | 0.914216 | 0.454383 | 0.843373 | 0.845164 | 0.841683 | 0.840848 | 0.783558 | 0.876369 |
| Pipeline_9 | HPO, FE, HPO | SnapBoostingMachineClassifier | 0.852674 | 0.461366 | 0.360899 | 0.760652 | 0.864133 | 0.749397 | 0.812920 | 0.912052 | 0.914216 | 0.454383 | 0.843373 | 0.845164 | 0.841683 | 0.840848 | 0.783558 | 0.876369 |
| Pipeline_8 | HPO, FE | SnapBoostingMachineClassifier | 0.852674 | 0.461366 | 0.360899 | 0.760652 | 0.864133 | 0.749397 | 0.812920 | 0.912052 | 0.914216 | 0.454383 | 0.843373 | 0.845164 | 0.841683 | 0.840848 | 0.783558 | 0.876369 |
| Pipeline_2 | HPO | XGBClassifier | 0.852582 | 0.468872 | 0.383217 | 0.806159 | 0.855620 | 0.752958 | 0.862457 | 0.804688 | 0.915069 | 0.428959 | 0.930723 | 0.816090 | 0.803607 | 0.740811 | 0.914433 | 0.863128 |
| Pipeline_3 | HPO, FE | XGBClassifier | 0.854128 | 0.468281 | 0.381095 | 0.808836 | 0.854556 | 0.754807 | 0.864630 | 0.801034 | 0.915529 | 0.427367 | 0.933735 | 0.816534 | 0.801603 | 0.736329 | 0.918796 | 0.862309 |
| Pipeline_4 | HPO, FE, HPO | XGBClassifier | 0.854157 | 0.469829 | 0.389012 | 0.809059 | 0.854267 | 0.756459 | 0.864440 | 0.800518 | 0.915895 | 0.428065 | 0.930723 | 0.818307 | 0.799599 | 0.734823 | 0.916111 | 0.860724 |
| Pipeline_5 | HPO, FE, HPO, Ensemble | BatchedTreeEnsembleClassifier(XGBClassifier) | 0.854157 | 0.469829 | 0.389012 | 0.809059 | 0.854267 | 0.756459 | 0.864440 | 0.800518 | 0.915895 | 0.428065 | 0.930723 | 0.818307 | 0.799599 | 0.734823 | 0.916111 | 0.860724 |
| Pipeline_1 | XGBClassifier | 0.844943 | 0.461713 | 0.332763 | 0.792997 | 0.852193 | 0.748169 | 0.850216 | 0.842246 | 0.910782 | 0.445344 | 0.948795 | 0.818789 | 0.847695 | 0.797751 | 0.884229 | 0.892351 | |
| Pipeline_6 | SnapBoostingMachineClassifier | 0.847776 | 0.463869 | 0.381273 | 0.752623 | 0.850786 | 0.742863 | 0.805745 | 0.896104 | 0.912415 | 0.460352 | 0.831325 | 0.842254 | 0.823647 | 0.819854 | 0.772486 | 0.862500 | |
| Pipeline_7 | HPO | SnapBoostingMachineClassifier | 0.847776 | 0.463869 | 0.381273 | 0.752623 | 0.850786 | 0.742863 | 0.805745 | 0.896104 | 0.912415 | 0.460352 | 0.831325 | 0.842254 | 0.823647 | 0.819854 | 0.772486 | 0.862500 |
You can visualize the scoring metric calculated on a holdout data set.
import pandas as pd
pd.options.plotting.backend = "plotly"
summary.holdout_roc_auc.plot()
Get selected pipeline model¶
Download and reconstruct a scikit-learn pipeline model object from the AutoAI training job.
best_pipeline = pipeline_optimizer.get_pipeline()
Check the confusion matrix for the selected pipeline.
pipeline_optimizer.get_pipeline_details()["confusion_matrix"]
| fn | fp | tn | tp | |
|---|---|---|---|---|
| true_class | ||||
| Risk | 27 | 52 | 280 | 140 |
| No Risk | 52 | 27 | 140 | 280 |
Check the feature importances for the selected pipeline.
pipeline_optimizer.get_pipeline_details()["features_importance"]
| features_importance | |
|---|---|
| Age | 0.1532 |
| NewFeature_7_pca_2 | 0.1046 |
| LoanDuration | 0.0983 |
| NewFeature_1_nxor(LoanDuration___Age___) | 0.0974 |
| NewFeature_15_pca_18 | 0.0818 |
| NewFeature_14_pca_16 | 0.0790 |
| NewFeature_9_pca_7 | 0.0737 |
| NewFeature_16_pca_19 | 0.0681 |
| EmploymentDuration | 0.0653 |
| NewFeature_13_pca_15 | 0.0607 |
| OwnsProperty | 0.0348 |
| CurrentResidenceDuration | 0.0277 |
| CheckingStatus | 0.0239 |
| OthersOnLoan | 0.0196 |
| Telephone | 0.0061 |
| ExistingCreditsCount | 0.0056 |
Convert the pipeline model to a Python script and download it¶
from ibm_watsonx_ai.helpers import pipeline_to_script
pipeline_to_script(best_pipeline)
Visualize pipeline¶
best_pipeline.export_to_sklearn_pipeline()
Pipeline(steps=[('featureunion',
FeatureUnion(transformer_list=[('float32_transform_140028733322592',
Pipeline(steps=[('numpycolumnselector',
NumpyColumnSelector(columns=[0,
1,
2,
3,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19])),
('compressstrings',
CompressStrings(compress_type='hash',
dtypes_list=['char_str',
'int_num',
'char_str',
'char_str',
'char_str',
'char_st...
autoai_libs.cognito.transforms.transform_utils.TAM(tans_class=sklearn.decomposition._pca.PCA(copy = True, iterated_power = 'auto', n_components = None, n_oversamples = 10, power_iteration_normalizer = 'auto', random_state = None, svd_solver = 'full', tol = 0.0, whiten = False), name = 'pca', tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker', 'nxor(LoanDuration___LoanAmount___)', 'nxor(LoanDuration___Age___)', 'nxor(LoanAmount___LoanDuration___)', 'nxor(LoanAmount___Age___)', 'nxor(Age___LoanDuration___)', 'nxor(Age___LoanAmount___)'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)),
('fs1-2',
autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification')),
('batchedtreeensembleclassifier',
BatchedTreeEnsembleClassifier(base_ensemble=SnapBoostingMachineClassifier(class_weight='balanced',
random_state=33),
                                              max_sub_ensembles=1))])
Each node in the visualization is a machine-learning operator (transformer or estimator). Each edge indicates data flow (transformed output from one operator becomes input to the next). The input to the root nodes is the initial dataset and the output from the sink node is the final prediction. When you hover the mouse pointer over a node, a tooltip shows you the configuration arguments of the corresponding operator (tuned hyperparameters). When you click on the hyperlink of a node, it brings you to a documentation page for the operator.
Pipeline source code¶
best_pipeline.pretty_print(ipython_display=True)
from autoai_libs.transformers.exportable import NumpyColumnSelector
from autoai_libs.transformers.exportable import CompressStrings
from autoai_libs.transformers.exportable import NumpyReplaceMissingValues
from autoai_libs.transformers.exportable import NumpyReplaceUnknownValues
from autoai_libs.transformers.exportable import boolean2float
from autoai_libs.transformers.exportable import CatImputer
from autoai_libs.transformers.exportable import CatEncoder
import numpy as np
from autoai_libs.transformers.exportable import float32_transform
from autoai_libs.transformers.exportable import FloatStr2Float
from autoai_libs.transformers.exportable import NumImputer
from autoai_libs.transformers.exportable import OptStandardScaler
from lale.lib.rasl import ConcatFeatures
from autoai_libs.transformers.exportable import NumpyPermuteArray
from autoai_libs.cognito.transforms.transform_utils import TGen
import autoai_libs.cognito.transforms.transform_extras
import autoai_libs.utils.fc_methods
from autoai_libs.cognito.transforms.transform_utils import FS1
from autoai_libs.cognito.transforms.transform_utils import TAM
from sklearn.decomposition import PCA
from snapml import BatchedTreeEnsembleClassifier
from snapml import SnapBoostingMachineClassifier
import lale
lale.wrap_imported_operators(
[
"autoai_libs.lale.numpy_column_selector",
"autoai_libs.lale.compress_strings",
"autoai_libs.lale.numpy_replace_missing_values",
"autoai_libs.lale.numpy_replace_unknown_values",
"autoai_libs.lale.boolean2float", "autoai_libs.lale.cat_imputer",
"autoai_libs.lale.cat_encoder", "autoai_libs.lale.float32_transform",
"autoai_libs.lale.float_str2_float", "autoai_libs.lale.num_imputer",
"autoai_libs.lale.opt_standard_scaler",
"autoai_libs.lale.numpy_permute_array", "autoai_libs.lale.tgen",
"autoai_libs.lale.fs1", "autoai_libs.lale.tam",
]
)
numpy_column_selector_0 = NumpyColumnSelector(
columns=[
0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
]
)
compress_strings = CompressStrings(
compress_type="hash",
dtypes_list=[
"char_str", "int_num", "char_str", "char_str", "char_str", "char_str",
"int_num", "char_str", "char_str", "int_num", "char_str", "int_num",
"char_str", "char_str", "int_num", "char_str", "int_num", "char_str",
"char_str",
],
missing_values_reference_list=["", "-", "?", float("nan")],
misslist_list=[
[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [],
[], [],
],
)
numpy_replace_missing_values_0 = NumpyReplaceMissingValues(
filling_values=float("nan"), missing_values=[]
)
numpy_replace_unknown_values = NumpyReplaceUnknownValues(
filling_values=float("nan"),
filling_values_list=[
float("nan"), float("nan"), float("nan"), float("nan"), float("nan"),
float("nan"), float("nan"), float("nan"), float("nan"), float("nan"),
float("nan"), float("nan"), float("nan"), float("nan"), float("nan"),
float("nan"), float("nan"), float("nan"), float("nan"),
],
missing_values_reference_list=["", "-", "?", float("nan")],
)
cat_imputer = CatImputer(
missing_values=float("nan"),
sklearn_version_family="1",
strategy="most_frequent",
)
cat_encoder = CatEncoder(
dtype=np.float64,
handle_unknown="error",
sklearn_version_family="1",
encoding="ordinal",
categories="auto",
)
numpy_column_selector_1 = NumpyColumnSelector(columns=[4])
float_str2_float = FloatStr2Float(
dtypes_list=["int_num"], missing_values_reference_list=[]
)
numpy_replace_missing_values_1 = NumpyReplaceMissingValues(
filling_values=float("nan"), missing_values=[]
)
num_imputer = NumImputer(missing_values=float("nan"), strategy="median")
opt_standard_scaler = OptStandardScaler(use_scaler_flag=False)
numpy_permute_array = NumpyPermuteArray(
axis=0,
permutation_indices=[
0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 4,
],
)
t_gen = TGen(
fun=autoai_libs.cognito.transforms.transform_extras.NXOR,
name="nxor",
arg_count=2,
datatypes_list=[["numeric"], ["numeric"]],
feat_constraints_list=[
[autoai_libs.utils.fc_methods.is_not_categorical],
[autoai_libs.utils.fc_methods.is_not_categorical],
],
col_names=[
"CheckingStatus", "LoanDuration", "CreditHistory", "LoanPurpose",
"LoanAmount", "ExistingSavings", "EmploymentDuration",
"InstallmentPercent", "Sex", "OthersOnLoan",
"CurrentResidenceDuration", "OwnsProperty", "Age", "InstallmentPlans",
"Housing", "ExistingCreditsCount", "Job", "Dependents", "Telephone",
"ForeignWorker",
],
col_dtypes=[
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"),
],
)
fs1_0 = FS1(
cols_ids_must_keep=range(0, 20),
additional_col_count_to_keep=20,
ptype="classification",
)
pca = PCA(svd_solver="full")
tam = TAM(
tans_class=pca,
name="pca",
col_names=[
"CheckingStatus", "LoanDuration", "CreditHistory", "LoanPurpose",
"LoanAmount", "ExistingSavings", "EmploymentDuration",
"InstallmentPercent", "Sex", "OthersOnLoan",
"CurrentResidenceDuration", "OwnsProperty", "Age", "InstallmentPlans",
"Housing", "ExistingCreditsCount", "Job", "Dependents", "Telephone",
"ForeignWorker", "nxor(LoanDuration___LoanAmount___)",
"nxor(LoanDuration___Age___)", "nxor(LoanAmount___LoanDuration___)",
"nxor(LoanAmount___Age___)", "nxor(Age___LoanDuration___)",
"nxor(Age___LoanAmount___)",
],
col_dtypes=[
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
np.dtype("float32"), np.dtype("float32"),
],
)
fs1_1 = FS1(
cols_ids_must_keep=range(0, 20),
additional_col_count_to_keep=20,
ptype="classification",
)
snap_boosting_machine_classifier = SnapBoostingMachineClassifier(
class_weight="balanced", gpu_ids=[0], random_state=33
)
batched_tree_ensemble_classifier = BatchedTreeEnsembleClassifier(
base_ensemble=snap_boosting_machine_classifier,
inner_lr_scaling=0.5,
max_sub_ensembles=1,
outer_lr_scaling=0.5,
)
pipeline = (
(
(
numpy_column_selector_0
>> compress_strings
>> numpy_replace_missing_values_0
>> numpy_replace_unknown_values
>> boolean2float()
>> cat_imputer
>> cat_encoder
>> float32_transform()
)
& (
numpy_column_selector_1
>> float_str2_float
>> numpy_replace_missing_values_1
>> num_imputer
>> opt_standard_scaler
>> float32_transform()
)
)
>> ConcatFeatures()
>> numpy_permute_array
>> t_gen
>> fs1_0
>> tam
>> fs1_1
>> batched_tree_ensemble_classifier
)
In the pretty-printed code, >> is the pipe combinator (dataflow
edge) and & is the and combinator (combining multiple subpipelines).
They correspond to the make_pipeline and make_union functions from
scikit-learn, respectively. If you prefer the functions, you can
instead pretty-print your pipeline with
best_pipeline.pretty_print(ipython_display=True, combinators=False).
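To see the correspondence concretely, here is a toy two-branch pipeline (unrelated to the AutoAI model above, and using plain scikit-learn rather than Lale) written in the function style: make_union plays the role of the & combinator followed by ConcatFeatures, and make_pipeline plays the role of >>. The operator choices here are purely illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import MinMaxScaler

# (PCA & MinMaxScaler) >> ConcatFeatures() >> LogisticRegression, in function form:
toy = make_pipeline(
    make_union(PCA(n_components=2), MinMaxScaler()),  # the "&" branches, concatenated
    LogisticRegression(max_iter=1000),                # the final ">>" step
)

X, y = load_iris(return_X_y=True)
toy.fit(X, y)
```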
Reading training data¶
train_df = pipeline_optimizer.get_data_connections()[0].read()
train_X = train_df.drop(["Risk"], axis=1).values
train_y = train_df.Risk.values
Test pipeline model locally¶
predicted_y = best_pipeline.predict(train_X)
predicted_y[:5]
array(['No Risk', 'No Risk', 'No Risk', 'No Risk', 'No Risk'],
dtype='<U32')
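Beyond the raw predictions, you can compute metrics locally and compare them with the holdout numbers in the summary table. A small sketch, assuming a fitted model with the scikit-learn classifier API (predict, predict_proba, classes_) — with the Lale pipeline you can call export_to_sklearn_pipeline() first. The `local_scores` helper name and the `positive_label` default are illustrative, and keep in mind that training-set scores are optimistic.

```python
from sklearn.metrics import accuracy_score, roc_auc_score

def local_scores(model, X, y, positive_label="Risk"):
    """Accuracy and ROC AUC for a fitted binary classifier."""
    pred = model.predict(X)
    positive_idx = list(model.classes_).index(positive_label)
    proba = model.predict_proba(X)[:, positive_idx]
    return {
        "accuracy": accuracy_score(y, pred),
        "roc_auc": roc_auc_score(y == positive_label, proba),
    }

# Training-set scores; compare with the holdout columns in the summary table:
# local_scores(best_pipeline.export_to_sklearn_pipeline(), train_X, train_y)
```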
To list historical runs, use the list() method. You can filter runs by providing the experiment name.
experiment.runs(filter="Credit Risk Prediction - AutoAI").list()
| timestamp | run_id | state | auto_pipeline_optimizer name | |
|---|---|---|---|---|
| 0 | 2025-05-22T11:07:20.924Z | 244c385d-b6f9-4bf4-b052-8b74a14f02a0 | completed | Credit Risk Prediction - AutoAI |
| 1 | 2025-05-22T08:56:25.928Z | 8e5258d0-06c8-4f3d-b955-abb684fc7b04 | completed | Credit Risk Prediction - AutoAI |
| 2 | 2025-05-20T12:25:39.332Z | 15373180-d57b-4e11-9ec1-14722ab93f68 | completed | Credit Risk Prediction - AutoAI |
| 3 | 2025-05-20T12:18:33.731Z | 1e84832e-8d18-4c5e-bbe0-39ffa25ca168 | completed | Credit Risk Prediction - AutoAI |
| 4 | 2025-05-20T12:12:46.930Z | ff4714de-3096-411d-ae02-23a7688970d9 | completed | Credit Risk Prediction - AutoAI |
| 5 | 2025-05-20T12:05:18.053Z | 1a1f11df-9ed5-4d7c-ab25-4ecffe8399be | completed | Credit Risk Prediction - AutoAI |
| 6 | 2025-05-20T11:48:51.962Z | 1261db6f-dcd4-40da-9366-554b01eae762 | completed | Credit Risk Prediction - AutoAI |
| 7 | 2025-05-20T11:25:39.867Z | 7598e9e7-3119-40bf-8ebe-f48e893b84e4 | completed | Credit Risk Prediction - AutoAI |
| 8 | 2025-05-20T11:17:48.659Z | d8ae59ca-3663-4c87-aabc-158e3d8d42e1 | completed | Credit Risk Prediction - AutoAI |
| 9 | 2025-05-20T11:13:59.065Z | ef870311-8347-4c2a-b26d-e55230c291a5 | completed | Credit Risk Prediction - AutoAI |
| 10 | 2025-05-20T10:16:37.361Z | 1db56fef-d2d5-4fc7-8c37-b61a12b8d6cb | completed | Credit Risk Prediction - AutoAI |
| 11 | 2025-05-20T10:10:37.427Z | 7b35f020-bf6c-4f1c-9c10-0a74ea0b3908 | completed | Credit Risk Prediction - AutoAI |
| 12 | 2025-05-20T10:01:03.977Z | cfcb4fc8-adbf-47db-a27c-0717ebe0bff8 | completed | Credit Risk Prediction - AutoAI |
| 13 | 2025-05-20T09:53:43.574Z | b7a8a218-4dfd-4340-a43b-55af3d996cf9 | completed | Credit Risk Prediction - AutoAI |
| 14 | 2025-05-20T09:51:28.407Z | 7cdd8e24-990b-4308-8a3a-7dd419fdce5b | completed | Credit Risk Prediction - AutoAI |
| 15 | 2025-05-20T09:46:03.492Z | 42f6fa2f-0ade-4774-9a8d-9937b07a044c | completed | Credit Risk Prediction - AutoAI |
| 16 | 2025-05-20T09:37:26.703Z | 598053af-9fb6-4f64-b3e0-08cd5339bae8 | completed | Credit Risk Prediction - AutoAI |
| 17 | 2025-05-20T09:37:10.284Z | 60ea85c3-bd37-41c2-8576-e3d4dc4721a8 | completed | Credit Risk Prediction - AutoAI |
| 18 | 2025-05-20T09:30:18.213Z | a61a1300-027e-4b74-a185-2a4f9666dd20 | failed | Credit Risk Prediction - AutoAI |
| 19 | 2025-05-20T09:15:17.649Z | 62522d10-ab2b-4aea-84f0-b1abd52271db | completed | Credit Risk Prediction - AutoAI |
To work with historical pipelines found during a particular optimizer
run, you need to first provide the run_id to select the fitted
optimizer.
Note: you can assign the selected run ID to the run_id variable.
run_id = run_details["metadata"]["id"]
Get executed optimizer's configuration parameters¶
experiment.runs(filter="Credit Risk Prediction - AutoAI").get_params(run_id=run_id)
{'name': 'Credit Risk Prediction - AutoAI',
'desc': 'Sample notebook',
'prediction_type': 'binary',
'prediction_column': 'Risk',
'prediction_columns': None,
'timestamp_column_name': None,
'holdout_size': None,
'max_num_daub_ensembles': None,
't_shirt_size': 'c076e82c-b2a7-4d20-9c0f-1f0c2fdf5a24',
'include_only_estimators': None,
'cognito_transform_names': None,
'train_sample_rows_test_size': None,
'text_processing': None,
'train_sample_columns_index_list': None,
'daub_give_priority_to_runtime': None,
'positive label': None,
'incremental_learning': None,
'early_stop_enabled': None,
'early_stop_window_size': None,
'outliers_columns': None,
'numerical_columns': None,
'categorical_columns': None,
'time_ordered_data': None,
'feature_selector_mode': None,
'test_data_csv_separator': ',',
'test_data_excel_sheet': None,
'test_data_encoding': 'utf-8',
'drop_duplicates': True,
'csv_separator': ',',
'excel_sheet': None,
'encoding': 'utf-8',
'retrain_on_holdout': True,
'scoring': 'roc_auc'}
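Most fields default to None; to inspect only the options that were explicitly set, you can filter the dictionary (sketched here on a hypothetical subset of the parameters above):

```python
# Hypothetical subset of the dict returned by get_params(run_id=run_id).
params = {
    "name": "Credit Risk Prediction - AutoAI",
    "prediction_type": "binary",
    "prediction_column": "Risk",
    "holdout_size": None,
    "scoring": "roc_auc",
}

# Keep only options that were explicitly configured.
configured = {k: v for k, v in params.items() if v is not None}
print(configured)
```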
Get historical optimizer instance and training details¶
historical_opt = experiment.runs.get_optimizer(run_id)
run_details = historical_opt.get_run_details()
List trained pipelines for selected optimizer¶
historical_opt.summary()
| Enhancements | Estimator | training_roc_auc_(optimized) | holdout_average_precision | holdout_log_loss | training_accuracy | holdout_roc_auc | training_balanced_accuracy | training_f1 | holdout_precision | training_average_precision | training_log_loss | holdout_recall | training_precision | holdout_accuracy | holdout_balanced_accuracy | training_recall | holdout_f1 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pipeline Name | ||||||||||||||||||
| Pipeline_10 | HPO, FE, HPO, Ensemble | BatchedTreeEnsembleClassifier(SnapBoostingMach... | 0.852674 | 0.461366 | 0.360899 | 0.760652 | 0.864133 | 0.749397 | 0.812920 | 0.912052 | 0.914216 | 0.454383 | 0.843373 | 0.845164 | 0.841683 | 0.840848 | 0.783558 | 0.876369 |
| Pipeline_9 | HPO, FE, HPO | SnapBoostingMachineClassifier | 0.852674 | 0.461366 | 0.360899 | 0.760652 | 0.864133 | 0.749397 | 0.812920 | 0.912052 | 0.914216 | 0.454383 | 0.843373 | 0.845164 | 0.841683 | 0.840848 | 0.783558 | 0.876369 |
| Pipeline_8 | HPO, FE | SnapBoostingMachineClassifier | 0.852674 | 0.461366 | 0.360899 | 0.760652 | 0.864133 | 0.749397 | 0.812920 | 0.912052 | 0.914216 | 0.454383 | 0.843373 | 0.845164 | 0.841683 | 0.840848 | 0.783558 | 0.876369 |
| Pipeline_2 | HPO | XGBClassifier | 0.852582 | 0.468872 | 0.383217 | 0.806159 | 0.855620 | 0.752958 | 0.862457 | 0.804688 | 0.915069 | 0.428959 | 0.930723 | 0.816090 | 0.803607 | 0.740811 | 0.914433 | 0.863128 |
| Pipeline_3 | HPO, FE | XGBClassifier | 0.854128 | 0.468281 | 0.381095 | 0.808836 | 0.854556 | 0.754807 | 0.864630 | 0.801034 | 0.915529 | 0.427367 | 0.933735 | 0.816534 | 0.801603 | 0.736329 | 0.918796 | 0.862309 |
| Pipeline_4 | HPO, FE, HPO | XGBClassifier | 0.854157 | 0.469829 | 0.389012 | 0.809059 | 0.854267 | 0.756459 | 0.864440 | 0.800518 | 0.915895 | 0.428065 | 0.930723 | 0.818307 | 0.799599 | 0.734823 | 0.916111 | 0.860724 |
| Pipeline_5 | HPO, FE, HPO, Ensemble | BatchedTreeEnsembleClassifier(XGBClassifier) | 0.854157 | 0.469829 | 0.389012 | 0.809059 | 0.854267 | 0.756459 | 0.864440 | 0.800518 | 0.915895 | 0.428065 | 0.930723 | 0.818307 | 0.799599 | 0.734823 | 0.916111 | 0.860724 |
| Pipeline_1 | XGBClassifier | 0.844943 | 0.461713 | 0.332763 | 0.792997 | 0.852193 | 0.748169 | 0.850216 | 0.842246 | 0.910782 | 0.445344 | 0.948795 | 0.818789 | 0.847695 | 0.797751 | 0.884229 | 0.892351 | |
| Pipeline_6 | SnapBoostingMachineClassifier | 0.847776 | 0.463869 | 0.381273 | 0.752623 | 0.850786 | 0.742863 | 0.805745 | 0.896104 | 0.912415 | 0.460352 | 0.831325 | 0.842254 | 0.823647 | 0.819854 | 0.772486 | 0.862500 | |
| Pipeline_7 | HPO | SnapBoostingMachineClassifier | 0.847776 | 0.463869 | 0.381273 | 0.752623 | 0.850786 | 0.742863 | 0.805745 | 0.896104 | 0.912415 | 0.460352 | 0.831325 | 0.842254 | 0.823647 | 0.819854 | 0.772486 | 0.862500 |
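summary() returns a pandas DataFrame, so you can rank pipelines by any listed metric. A sketch with made-up scores (the column name follows the table above):

```python
import pandas as pd

# Toy stand-in for historical_opt.summary(), indexed by pipeline name.
summary = pd.DataFrame(
    {"holdout_roc_auc": [0.864, 0.855, 0.852]},
    index=["Pipeline_10", "Pipeline_2", "Pipeline_1"],
)

# Name of the pipeline with the best holdout ROC AUC.
best_name = summary["holdout_roc_auc"].idxmax()
print(best_name)
```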
Get selected pipeline and test locally¶
hist_pipeline = historical_opt.get_pipeline(pipeline_name="Pipeline_3")
predicted_y = hist_pipeline.predict(train_X)
predicted_y[:5]
array(['No Risk', 'No Risk', 'No Risk', 'No Risk', 'No Risk'],
dtype=object)
6. Pipeline refinement with Lale and testing¶
In this section you learn how to refine and retrain the best pipeline returned by AutoAI. There are many ways to refine a pipeline; for illustration, simply replace the final estimator in the pipeline with an interpretable model. The call to wrap_imported_operators() augments scikit-learn operators with schemas for hyperparameter tuning.
from sklearn.linear_model import LogisticRegression as LR
from sklearn.tree import DecisionTreeClassifier as Tree
from sklearn.neighbors import KNeighborsClassifier as KNN
from lale.lib.lale import Hyperopt
from lale import wrap_imported_operators
wrap_imported_operators()
Pipeline decomposition and new definition¶
Start by removing the last step of the pipeline, i.e., the final estimator.
prefix = hist_pipeline.remove_last().freeze_trainable()
prefix.export_to_sklearn_pipeline()
Pipeline(steps=[('featureunion',
FeatureUnion(transformer_list=[('float32_transform_140028747820080',
Pipeline(steps=[('numpycolumnselector',
NumpyColumnSelector(columns=[0,
1,
2,
3,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19])),
('compressstrings',
CompressStrings(compress_type='hash',
dtypes_list=['char_str',
'int_num',
'char_str',
'char_str',
'char_str',
'char_st...
autoai_libs.cognito.transforms.transform_utils.TA2(fun = numpy.add, name = 'sum', datatypes1 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints1 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], datatypes2 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints2 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)),
('fs1',
                 autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification'))])
Next, add a new final step, which consists of a choice of three
estimators. In this code, | is the or combinator (algorithmic
choice). It defines a search space for another optimizer run.
new_pipeline = prefix >> (LR | Tree | KNN)
New optimizer Hyperopt configuration and training¶
To automatically select the algorithm and tune its hyperparameters, we
create an instance of the Hyperopt optimizer and fit it to the
data.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
train_X, train_y, test_size=0.15, random_state=33
)
hyperopt = Hyperopt(estimator=new_pipeline, cv=3, max_evals=20, scoring="roc_auc")
hyperopt_pipelines = hyperopt.fit(X_train, y_train)
100%|██████████| 20/20 [00:25<00:00, 1.25s/trial, best loss: -0.8329758851551331]
pipeline_model = hyperopt_pipelines.get_pipeline()
Pipeline model tests and visualization¶
from sklearn.metrics import roc_auc_score

predicted_y = pipeline_model.predict(X_test)
# roc_auc_score expects (y_true, y_score): true labels first, then scores.
score = roc_auc_score(y_test == "Risk", predicted_y == "Risk")
print(f"roc_auc_score {score:.1%}")
roc_auc_score 73.9%
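Computing AUC from hard labels reflects only a single decision threshold. When the chosen estimator exposes predict_proba, scoring the positive-class probabilities is usually more informative; a self-contained sketch on synthetic data (not the credit-risk pipeline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)[:, 1]  # positive-class probabilities

# Ranking quality across all thresholds, not just the 0.5 cutoff.
print(f"roc_auc_score {roc_auc_score(y, proba):.1%}")
```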
pipeline_model.export_to_sklearn_pipeline()
Pipeline(steps=[('featureunion',
FeatureUnion(transformer_list=[('float32_transform_140028690563632',
Pipeline(steps=[('numpycolumnselector',
NumpyColumnSelector(columns=[0,
1,
2,
3,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19])),
('compressstrings',
CompressStrings(compress_type='hash',
dtypes_list=['char_str',
'int_num',
'char_str',
'char_str',
'char_str',
'char_st...
autoai_libs.cognito.transforms.transform_utils.TA2(fun = numpy.add, name = 'sum', datatypes1 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints1 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], datatypes2 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints2 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)),
('fs1',
autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification')),
('logisticregression',
LogisticRegression(intercept_scaling=0.572728119007886,
max_iter=166, solver='liblinear',
                                    tol=0.0007366195178949867))])
7. Deploy and Score¶
In this section you will learn how to deploy the pipeline model and score it, both as a web service and as a batch job, using a watsonx.ai Runtime instance.
Webservice deployment creation¶
from ibm_watsonx_ai.deployment import WebService
service = WebService(credentials, source_space_id=space_id)
service.create(
experiment_run_id=run_id,
model="Pipeline_1",
deployment_name="Credit Risk Deployment AutoAI",
)
Preparing an AutoAI Deployment... Published model uid: bb2d0f77-37a5-4446-9766-f1a63119df9d Deploying model bb2d0f77-37a5-4446-9766-f1a63119df9d using V4 client. ###################################################################################### Synchronous deployment creation for id: 'bb2d0f77-37a5-4446-9766-f1a63119df9d' started ###################################################################################### initializing Note: online_url is deprecated and will be removed in a future release. Use serving_urls instead. ..... ready ----------------------------------------------------------------------------------------------- Successfully finished deployment creation, deployment_id='f907e2b2-308a-4402-a7ac-21d529577cbd' -----------------------------------------------------------------------------------------------
The deployment object can be printed to show basic information:
print(service)
To show all available information about the deployment use the .get_params() method:
service.get_params()
Scoring of webservice¶
You can make a scoring request by calling score() on the deployed pipeline.
predictions = service.score(payload=train_df.drop(["Risk"], axis=1).iloc[:10])
predictions
{'predictions': [{'fields': ['prediction', 'probability'],
'values': [['No Risk', [0.9039297699928284, 0.09607024490833282]],
['No Risk', [0.8551719188690186, 0.14482809603214264]],
['No Risk', [0.821358323097229, 0.17864170670509338]],
['No Risk', [0.9926615357398987, 0.007338482886552811]],
['No Risk', [0.9375228881835938, 0.06247711926698685]],
['No Risk', [0.9628375768661499, 0.0371624194085598]],
['No Risk', [0.9962857961654663, 0.003714216174557805]],
['No Risk', [0.9755261540412903, 0.02447384223341942]],
['No Risk', [0.9485515356063843, 0.05144843831658363]],
['No Risk', [0.906017541885376, 0.09398248046636581]]]}]}
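The response is plain JSON; pairing fields with each row of values turns it into records. A sketch against the structure shown above (abbreviated probabilities):

```python
# Hypothetical response in the shape returned by service.score().
response = {
    "predictions": [
        {
            "fields": ["prediction", "probability"],
            "values": [
                ["No Risk", [0.90, 0.10]],
                ["No Risk", [0.85, 0.15]],
            ],
        }
    ]
}

block = response["predictions"][0]
# Zip field names with each row to get {"prediction": ..., "probability": ...}.
rows = [dict(zip(block["fields"], values)) for values in block["values"]]
print(rows[0]["prediction"])
```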
If you want to work with the web service in an external Python application, you can retrieve the service object:
- Initialize the service using service = WebService(credentials)
- Get the deployment_id using the service.list() method
- Get the web service object using the service.get(deployment_id) method
After that you can call the service.score() method.
Deleting deployment¶
You can delete the existing deployment by calling the service.delete() command.
To list the existing web services you can use service.list().
Batch deployment creation¶
A batch deployment either processes inline input data and returns predictions in the scoring details, or reads input from a data asset and writes the output to a file.
batch_payload_df = train_df.drop(["Risk"], axis=1)[:5]
batch_payload_df
| CheckingStatus | LoanDuration | CreditHistory | LoanPurpose | LoanAmount | ExistingSavings | EmploymentDuration | InstallmentPercent | Sex | OthersOnLoan | CurrentResidenceDuration | OwnsProperty | Age | InstallmentPlans | Housing | ExistingCreditsCount | Job | Dependents | Telephone | ForeignWorker | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | less_0 | 18 | credits_paid_to_date | car_new | 462 | less_100 | 1_to_4 | 2 | female | none | 2 | savings_insurance | 37 | stores | own | 2 | skilled | 1 | none | yes |
| 1 | less_0 | 15 | prior_payments_delayed | furniture | 250 | less_100 | 1_to_4 | 2 | male | none | 3 | real_estate | 28 | none | own | 2 | skilled | 1 | yes | no |
| 2 | less_0 | 16 | credits_paid_to_date | vacation | 3109 | less_100 | 4_to_7 | 3 | female | none | 1 | car_other | 36 | none | own | 2 | skilled | 1 | none | yes |
| 3 | less_0 | 5 | all_credits_paid_back | car_new | 1523 | less_100 | unemployed | 2 | female | none | 2 | real_estate | 19 | none | rent | 1 | management_self-employed | 1 | none | yes |
| 4 | less_0 | 9 | all_credits_paid_back | car_used | 4302 | less_100 | 1_to_4 | 3 | male | none | 1 | car_other | 34 | none | free | 1 | skilled | 1 | none | yes |
Create a batch deployment for Pipeline_2, created in the AutoAI experiment with the given run_id.
from ibm_watsonx_ai.deployment import Batch
service_batch = Batch(credentials, source_space_id=space_id)
service_batch.create(
experiment_run_id=run_id,
model="Pipeline_2",
deployment_name="Credit Risk Batch Deployment AutoAI",
)
Preparing an AutoAI Deployment... Published model uid: 4d31e8f2-b7d8-4f05-8d97-78ad3705e63f Deploying model 4d31e8f2-b7d8-4f05-8d97-78ad3705e63f using V4 client. ###################################################################################### Synchronous deployment creation for id: '4d31e8f2-b7d8-4f05-8d97-78ad3705e63f' started ###################################################################################### ready. ----------------------------------------------------------------------------------------------- Successfully finished deployment creation, deployment_id='c22c5228-19d7-4014-9fa0-f466f04eab66' -----------------------------------------------------------------------------------------------
Score batch deployment with inline payload as pandas DataFrame.¶
scoring_params = service_batch.run_job(payload=batch_payload_df, background_mode=False)
########################################################################## Synchronous scoring for id: '26e4bf58-2ed9-4fe1-8138-f3ffa6151a5b' started ########################################################################## queued... completed Scoring job '26e4bf58-2ed9-4fe1-8138-f3ffa6151a5b' finished successfully.
scoring_params["entity"]["scoring"].get("predictions")
[{'fields': ['prediction', 'probability'],
'values': [['No Risk', [0.8447757363319397, 0.1552242636680603]],
['No Risk', [0.9002561569213867, 0.09974387288093567]],
['No Risk', [0.8199893832206726, 0.1800106316804886]],
['No Risk', [0.9774191379547119, 0.022580860182642937]],
['No Risk', [0.9135990738868713, 0.08640092611312866]]]}]
Score batch deployment with payload as connected asset.¶
Similarly to training, use the created connection or data asset to locate the payload table.
from ibm_watsonx_ai.helpers.connections import DeploymentOutputAssetLocation
batch_payload_filename = "credit_risk_batch_payload.csv"
batch_payload_df.to_csv(batch_payload_filename, index=False)
asset_details = client.data_assets.create(
name=batch_payload_filename, file_path=batch_payload_filename
)
asset_id = client.data_assets.get_id(asset_details)
payload_reference = DataConnection(data_asset_id=asset_id)
results_reference = DataConnection(
location=DeploymentOutputAssetLocation(name="batch_output_credit_risk.csv")
)
Creating data asset... SUCCESS
Run scoring job for batch deployment.
scoring_params = service_batch.run_job(
payload=[payload_reference],
output_data_reference=results_reference,
background_mode=False,
)
########################################################################## Synchronous scoring for id: '45b4a264-6bc5-4002-8e7a-46a06ac5d995' started ########################################################################## queued... completed Scoring job '45b4a264-6bc5-4002-8e7a-46a06ac5d995' finished successfully.
Deleting deployment¶
You can delete the existing deployment by calling the service_batch.delete() command.
To list the existing:
- batch services, use service_batch.list()
- scoring jobs, use service_batch.list_jobs()
8. Clean up¶
If you want to clean up all created assets:
- experiments
- trainings
- pipelines
- model definitions
- models
- functions
- deployments
please follow this sample notebook.
9. Summary and next steps¶
You successfully completed this notebook!
You learned how to use ibm-watsonx-ai to run AutoAI experiments.
Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.
Authors¶
Lukasz Cmielowski, PhD, is an Automation Architect and Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge.
Amadeusz Masny, Python Software Developer in watsonx.ai at IBM
Kiran Kate, Senior Software Engineer at IBM Research AI
Martin Hirzel, Research Staff Member and Manager at IBM Research AI
Jan Sołtysik, Intern in watsonx.ai
Copyright © 2020-2025 IBM. This notebook and its source code are released under the terms of the MIT License.